{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Process Handbook"
   ]
  },

  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "This notebook processes the example handbook (CAHRC_HR_Manual.txt).  This is done in a simple fashion using the following heuristic: If a line of text consisting of less than 5 words is followed by paragraphs of text the assume the line of text with less than 5 words is a topic (i.e. the topic of a question an employee might ask) and that the paragraphs of text are the answer to that question (called action_text for the lack of a better term).\n",
    "\n",
    "When a topic and action_text are found these are stored in Cloud Datastore as a key-value pair with the topic as the key and the action_text as the value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "!pip uninstall -y google-cloud-datastore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "!pip install --upgrade google-cloud-datastore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "!pip install six==1.14.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hit **Reset Session** > **Restart**, then resume with the following cells. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "from google.cloud import datastore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "datastore_client = datastore.Client()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "employee_handbook = open('CAHRC_HR_Manual.txt', 'r')\n",
    "while True:\n",
    "  \n",
    "  topic = employee_handbook.readline()\n",
    "  if not(topic):\n",
    "    break\n",
    "  \n",
    "  if (topic != '\\r\\n') and (len(topic.split(' ')) < 5):\n",
    "  \n",
    "    action_text = ''\n",
    "        \n",
    "    last_line = ''\n",
    "    line = employee_handbook.readline()\n",
    "    \n",
    "    while (last_line != '\\r\\n') and (line != '\\r\\n') and (len(line.split(' ')) > 5):\n",
    "      \n",
    "      action_text += line\n",
    "      last_line = line\n",
    "      line = employee_handbook.readline()\n",
    "      \n",
    "    if action_text != '':\n",
    "      \n",
    "      kind = 'Topic'\n",
    "      topic_key = datastore_client.key(kind, topic.strip().lower())\n",
    "\n",
    "      topic = datastore.Entity(key=topic_key)\n",
    "      topic['action_text'] = action_text\n",
    "\n",
    "      datastore_client.put(topic)\n",
    "\n",
    "      print('Saved {}: {}'.format(topic.key.name, topic['action_text']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
